Learning to Recognize Promoter Sequences in E. coli by Modeling Uncertainty in the Training Data
Author
Abstract
Automatic recognition of promoter sequences is an important open problem in molecular biology. Unfortunately, the usual machine learning version of this problem is critically flawed. In particular, the dataset available from the Irvine repository was drawn from a compilation of promoter sequences that were preprocessed to conform to the biologists’ related notion of the consensus sequence, a first-order approximation with a number of shortcomings that are well known in molecular biology. Although concept descriptions learned from the Irvine data may represent the consensus sequence, they do not represent promoters. More generally, imperfections in preprocessed data and statistical variations in the locations of biologically meaningful features within the raw data invalidate standard attribute-based approaches. I suggest a dataset, a concept-description language, and a model of uncertainty in the promoter data that are all biologically justified, then address the learning problem with incremental probabilistic evidence combination. This knowledge-based approach yields a more accurate and more credible solution than more conventional machine learning systems.
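The point about positional variation can be made concrete with a small sketch. It is not the method described in the abstract. It only illustrates why scoring a sequence at one fixed alignment, as an attribute vector implicitly does, can miss features whose location shifts by a base or two. The hexamers `TTGACA` and `TATAAT` are the textbook sigma-70 consensus elements, the 15 to 19 bp spacer window is an assumption, and the function names are hypothetical.

```python
# Illustrative only: the consensus hexamers and the spacer window are textbook
# sigma-70 values assumed for this sketch; they are not taken from the paper.
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"
SPACERS = range(15, 20)  # assumed spacer lengths: 15..19 bp


def matches(window: str, consensus: str) -> int:
    """Count positions where the window agrees with the consensus."""
    return sum(a == b for a, b in zip(window, consensus))


def best_hit(seq: str):
    """Return (score, start of -35 element, spacer) for the best placement."""
    best = (-1, -1, -1)
    n = len(MINUS_35)
    for i in range(len(seq) - 2 * n - SPACERS[0] + 1):
        for gap in SPACERS:
            j = i + n + gap  # start of the candidate -10 element
            if j + n > len(seq):
                break
            score = matches(seq[i:i + n], MINUS_35) + matches(seq[j:j + n], MINUS_10)
            if score > best[0]:
                best = (score, i, gap)
    return best


def fixed_position_score(seq: str, i: int, gap: int = 17) -> int:
    """Score both elements at one fixed alignment (attribute-vector view)."""
    n = len(MINUS_35)
    j = i + n + gap
    return matches(seq[i:i + n], MINUS_35) + matches(seq[j:j + n], MINUS_10)
```

On a sequence whose two elements are perfect but separated by 16 bp rather than 17, `best_hit` recovers both, while `fixed_position_score` at the 17 bp alignment sees the -10 element shifted by one base and scores it poorly.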
Publication date: 1994